Approximate Sparse Linear Regression

Authors

  • Sariel Har-Peled
  • Piotr Indyk
  • Sepideh Mahabadi
Abstract

In the Sparse Linear Regression (SLR) problem, given a d×n matrix M and a d-dimensional vector q, we want to compute a k-sparse vector τ such that the error ‖Mτ − q‖ is minimized. In this paper, we present algorithms and conditional lower bounds for several variants of this problem. In particular, we consider (i) the Affine SLR, where we add the constraint that ∑_i τ_i = 1, and (ii) the Convex SLR, where we further add the constraint that τ ≥ 0. Furthermore, we consider (i) the batched (offline) setting, where the matrix M and the vector q are given as inputs in advance, and (ii) the query (online) setting, where an algorithm preprocesses the matrix M to quickly answer such queries. All of the aforementioned variants have been well studied and have many applications in statistics, machine learning, and sparse recovery. We consider the approximate variants of these problems in the "low sparsity regime," where the value of the sparsity bound k is low. In particular, we show that the online variant of all three problems can be solved with query time Õ(n^{k−1}). This provides a non-trivial improvement over the naive algorithm that exhaustively searches all (n choose k) subsets of k columns. We also show that solving the offline variant of all three problems would require an exponential dependence of the form Ω̃(n^{k/2}/e^k), under a natural complexity-theoretic conjecture. Improving this lower bound for the case of k = 4 would imply a non-trivial lower bound for the famous Hopcroft's problem. Moreover, solving the offline variant of Affine SLR in o(n^{k−1}) time would imply an upper bound of o(n^d) for the problem of testing whether a given set of n points in d-dimensional space is degenerate; however, this is conjectured to require Ω(n^d) time. We also present algorithms for some special cases by exploiting the specific structure of the problems. Last but not least, our algorithms involve formulating and solving several interesting subproblems that may find applications in other areas.
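The naive baseline that the abstract compares against, trying every (n choose k) subset of columns of M and solving an unconstrained least-squares problem on each, can be sketched as follows. This is an illustrative NumPy sketch, not the paper's algorithm; the function name and toy data are assumptions made for the example.

```python
import itertools
import numpy as np

def sparse_regression_bruteforce(M, q, k):
    """Naive exact SLR: try all C(n, k) column subsets of M and return
    the k-sparse vector tau minimizing ||M @ tau - q||_2, with its error."""
    d, n = M.shape
    best_err, best_tau = float("inf"), None
    for cols in itertools.combinations(range(n), k):
        sub = M[:, cols]                                  # d x k submatrix
        coef, *_ = np.linalg.lstsq(sub, q, rcond=None)    # least squares on subset
        err = np.linalg.norm(sub @ coef - q)
        if err < best_err:
            best_err = err
            best_tau = np.zeros(n)
            best_tau[list(cols)] = coef                   # embed as k-sparse vector
    return best_tau, best_err

# Toy instance: q is exactly 2*col0 + 3*col1, so the 2-sparse error is 0.
M = np.array([[1., 0., 0., 1.],
              [0., 1., 0., 1.],
              [0., 0., 1., 1.]])
q = np.array([2., 3., 0.])
tau, err = sparse_regression_bruteforce(M, q, 2)   # err is ~0, support {0, 1}
```

Since the loop visits all (n choose k) subsets, the running time grows as roughly n^k least-squares solves, which is the exhaustive-search cost the paper's Õ(n^{k−1}) query-time algorithms improve upon.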


Similar resources

Robust Estimation in Linear Regression with Multicollinearity and Sparse Models

One of the factors affecting the statistical analysis of data is the presence of outliers. Methods that are not affected by outliers are called robust methods. Robust regression methods are robust estimation methods of regression model parameters in the presence of outliers. Besides outliers, the linear dependency of regressor variables, which is called multicollinearity...


Gaussian Kullback-Leibler approximate inference

We investigate Gaussian Kullback-Leibler (G-KL) variational approximate inference techniques for Bayesian generalised linear models and various extensions. In particular we make the following novel contributions: sufficient conditions for which the G-KL objective is differentiable and convex are described; constrained parameterisations of Gaussian covariance that make G-KL methods fast and scal...


Automatic hard thresholding for sparse signal reconstruction from NDE measurements

We propose an automatic hard thresholding (AHT) method for sparse-signal reconstruction. The measurements follow an underdetermined linear model, where the regression-coefficient vector is modeled as a superposition of an unknown deterministic sparse-signal component and a zero-mean white Gaussian component with unknown variance. Our method demands no prior knowledge about signal sparsity. Our ...


Adaptive Posterior Mode Estimation of a Sparse Sequence for Model Selection

For the problem of estimating a sparse sequence of coefficients of a parametric or nonparametric generalized linear model, posterior mode estimation with a Subbotin(λ, ν) prior achieves thresholding and therefore model selection when ν ∈ [0, 1] for a class of likelihood functions. The proposed estimator also offers a continuum between the (forward/backward) best subset estimator (ν = 0), its ap...


Gene regulatory network inference using sparse probabilistic models

The main task of systems biology is to uncover mechanisms that regulate complex processes that take place in biological cells, especially the mechanisms of gene regulation. This project aims to identify gene regulatory interactions taking place in the early development of neural tube. Solutions proposed in this work for identification of transcription factors and their target genes are mostly b...


Hardness of Approximation for Sparse Optimization with L0 Norm

In this paper, we consider sparse optimization problems with L0 norm penalty or constraint. We prove that it is strongly NP-hard to find an approximate optimal solution within certain error bound, unless P = NP. This provides a lower bound for the approximation error of any deterministic polynomialtime algorithm. Applying the complexity result to sparse linear regression reveals a gap between c...




Journal:
  • CoRR

Volume: abs/1609.08739

Published: 2016